User:John Cummings/Archive/Dataimporthub
Data import hub

This page is a hub to organise importing data from external sources. To request a data import, please see the section below; the basic process of a dataset being imported follows the workflow stages shown in the table below.

Why import data into Wikidata

Request a data import

Instructions for data importers

Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Source: Link: Description: | Link: Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Discussion:
Imported data sets

Please click here for a list of previously imported data sets
MIS Quarterly Articles information
Name of dataset: MIS Quarterly Articles information
Source: http://www.misq.org/roles/
Link: http://www.misq.org/roles/
Description: MISQ is the highest-impact-factor journal in Information Systems. I would like to import its article information into Wikidata.
Request by: Mahdimoqri (talk) 22:13, 12 January 2017 (UTC)
SCOGS (Select Committee on GRAS Substances), Generally recognised as safe database
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: SCOGS (Select Committee on GRAS Substances), Generally recognised as safe. Source: FDA. Link: https://www.accessdata.fda.gov/scripts/fdcc/cfc/XMLService.cfc?method=downloadxls&set=SCOGS Description: FDA-allowed dietary supplements. | Link: Done: https://docs.google.com/spreadsheets/d/1-6PkozVUm_8dKxPDqs8M71Me0oNA4m-h8UjY1qZkheU/edit?usp=sharing To do: Notes: Added fields: | Structure: For each item, add instance of (P31). Field: Example item: Done: https://www.wikidata.org/wiki/Q132298 To do: Cannot use new properties directly. | Done: Data formatted. To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Mdupont (talk) 23:59, 1 July 2017 (UTC)
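As a minimal sketch of the "Create and import data into spreadsheet" step, the following (assuming pandas with an .xls-capable engine such as xlrd is installed) fetches the SCOGS export from the FDA URL in the table and prints its columns, a reasonable first pass before mapping fields to Wikidata properties:

```python
# Hedged sketch: download the SCOGS .xls export named in the table above and
# inspect it. Column names are whatever the FDA ships; nothing here is a
# confirmed part of the import workflow.
import pandas as pd

SCOGS_URL = ("https://www.accessdata.fda.gov/scripts/fdcc/cfc/"
             "XMLService.cfc?method=downloadxls&set=SCOGS")

df = pd.read_excel(SCOGS_URL)      # pandas accepts a URL directly
print(df.columns.tolist())         # candidate fields to map to properties
print(df.head())                   # first few substances as a sanity check
```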
Global disease burden data from IHME institute
Workflow

| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: IHME Global Burden of Disease Study 2016. Source: Institute for Health Metrics and Evaluation (IHME) at the University of Washington. Link: [1] Description: IHME produces global and country-specific estimates of disease burden (i.e. years of healthy life lost to death or disease). The estimates of disease burden for different diseases would be valuable in understanding their relative importance in the world. Property disease burden (P2854) can be used to link a disease to a respective estimate in DALYs. | Link: Google drive folder; Google sheet for data. Done: To do: Notes: The diseases should be linked to existing disease items in Wikidata. Is there a list of diseases per ICD-10 code? | Structure: Example item: laryngeal cancer (Q852423). Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
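On the ICD-10 question in the notes above: one way to get such a list is to ask the Wikidata Query Service for items carrying an ICD-10 code. A hedged sketch, assuming the standard SPARQL endpoint and the `requests` library (P494 appears to be the ICD-10 property; verify before relying on it):

```python
# Hedged sketch: list diseases by ICD-10 code via the Wikidata Query Service.
import requests

SPARQL = """
SELECT ?disease ?diseaseLabel ?icd10 WHERE {
  ?disease wdt:P494 ?icd10 .    # P494: ICD-10 code (assumed property ID)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "data-import-hub-example/0.1 (demo)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["icd10"]["value"], row["diseaseLabel"]["value"], row["disease"]["value"])
```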
Discussion:
Notified participants of WikiProject Medicine
How do I actually link the names of the diseases (in the data) to the disease items (in Wikidata)? --Jtuom (talk) 14:49, 20 December 2017 (UTC)
- @Jtuom: I would write a separate script that maps the names to the Wikidata IDs. It makes the process much more painless to first check that the mapping works well. --Tobias1984 (talk) 19:26, 20 December 2017 (UTC)
- @Tobias1984: Thanks for the advice. However, I assume that when you say "I would write" you don't mean that you would actually write such a script yourself. I'd love to do it but I don't know how. So far, I have managed to create my first SPARQL script by imitating existing scripts [2]. However, the sensitivity and specificity of that script are very poor, and it cannot be used to map the diseases I need for this data import. I'd like to try a script that takes each disease name from my data, searches for it on Wikipedia, and returns the respective item number from Wikidata -- but I have no idea how that could be done. There are maybe 180 diseases on the list, so it could be done in half a day by hand, but there are probably better solutions. Can someone help? --Jtuom (talk) 13:25, 22 December 2017 (UTC)
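A hedged sketch of the kind of mapping script discussed here, assuming the `requests` library: it uses the standard `wbsearchentities` API action on wikidata.org and simply keeps the top hit, so the output still needs the manual review pass Tobias1984 recommends. The disease list is a placeholder:

```python
# Hedged sketch: map disease names to Wikidata QIDs via wbsearchentities.
import requests

API = "https://www.wikidata.org/w/api.php"
disease_names = ["laryngeal cancer", "tuberculosis"]   # placeholder for the ~180 names

def search_qid(name):
    params = {
        "action": "wbsearchentities",   # standard Wikibase search action
        "search": name,
        "language": "en",
        "type": "item",
        "format": "json",
    }
    hits = requests.get(API, params=params).json().get("search", [])
    return hits[0]["id"] if hits else None  # top hit only; review manually

for name in disease_names:
    print(f"{name} -> {search_qid(name)}")
```

With roughly 180 names this runs in well under a minute, so the half-day of manual work shrinks to spot-checking the ambiguous matches.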
Import a spreadsheet myself?
Hello, I've prepared a spreadsheet to import the names of the winners of the tennis Swiss Open from 2000 to 2017. I see this as a test before I start importing more sports data. Is there a way I can import this file myself, or do I need to use the Import Hub? Here is the file for your review: https://docs.google.com/spreadsheets/d/1sTwCwyo6n-xPlWjk3xT2DmKUoYKOxkjpHsa6-0_kYIM/edit?usp=sharing Wallerstein-WD (talk) 22:21, 30 May 2018 (UTC)
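Not an answer confirmed in this thread, but one common self-service route is QuickStatements. A hedged sketch that turns a winners CSV into QuickStatements V1 commands; the file name, the column names, and the edition-item mapping are all hypothetical, and winner (P1346) is assumed to be the appropriate property for a tournament edition:

```python
# Hedged sketch: emit QuickStatements V1 rows ("item<TAB>property<TAB>value")
# adding winner (P1346) statements to tournament-edition items.
import csv

# Hypothetical mapping from year to the QID of that year's tournament edition;
# these must be looked up or created first.
EDITION_QID = {"2016": "Q-EDITION-2016", "2017": "Q-EDITION-2017"}

with open("swiss_open_winners.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):       # hypothetical columns: year, winner_qid
        edition = EDITION_QID.get(row["year"])
        if edition:
            print(f"{edition}\tP1346\t{row['winner_qid']}")
```

The printed lines can then be pasted into the QuickStatements tool for review before running the batch.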
TheRightDoctors
Name of dataset: TheRightDoctors
Source: Internet
Link: www.therightdoctors.com
Description: Insights from the world's best medical minds. Connect with us to connect with them. We are a digital-health Google Launchpad start-up.
Request by: Dr. Chandra Shekar
https://www.dropbox.com/s/oy4bdvtq6dav7b5/books_at_moma.xlsx?dl=0
Liste historische Kantonsräte des Kantons Zürich (list of historical members of the Cantonal Council of the Canton of Zurich)

Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Mitglieder des Kantonsrats des Kantons Zürich (members of the Cantonal Council of the Canton of Zurich). Source: Kanton Zürich, Direktion der Justiz und des Innern, Wahlen & Abstimmungen: https://wahlen-abstimmungen.zh.ch/internet/justiz_inneres/wahlen-abstimmungen/de/wahlen/krdaten_staatsarchiv/datenexporthinweise.html | Link: [3] Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Thist uzh (talk) 07:12, 2 August 2018 (UTC)
Kerala Flood Data
Hi team,
I would like to upload verified and validated data related to the Kerala floods.
Censo-guía of Archives
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
| Name: Censo-guía de Archivos de España e Iberoamérica. Source: Censo-guía de Archivos de España e Iberoamérica. Link: Directorio - Property ID in Wikidata (the Censo-guía is an authority control for Wikidata). Description: The Censo-guía de Archivos de España e Iberoamérica was created by Law 16/1985 (25 June), the law of "Patrimonio Histórico Español". Article 51 determines that "the State Administration, in collaboration with the other competent administrations, shall compile the census of the assets that make up the documentary heritage". The Censo-guía was later expanded to include institutions from Ibero-America. It functions as a control tool and a communications tool for the archives that exist in Ibero-America. | Link: History; overview of the Censo-guía content. Done: To do: Notes: | Structure: Fields used for the spreadsheet can be found here; this can be expanded to be run throughout the 44k XML entries, and the XML schema of the Censo-guía can be found here (overview) and here (schema). Example item: Done: To do: | Done: To do: Notes: The spreadsheet with the total registries can be found here. | Done: To do: Manual work needed: I'm not sure whether attributes such as repositorarea, lengthshelf and repositorycode exist in Wikidata. Repository code is quite an important one. | Date complete: Notes: |
Discussion:

To clarify, this is the first time I'm trying to do such an import. I've downloaded the around 45k registries from the Censo-guía in XML format, and a friend helped me to convert the XML into a CSV file. I can iterate over those 45k registries to include any other information that might be relevant according to the schema (notice, however, that they don't necessarily have all the fields completed in the XML files). I'm also able to work on improving the data that's currently in the spreadsheet, like removing "()", changing the names of the archives that are in uppercase, and so on. But I'd welcome any instructions on how to improve this dataset so it can be successfully imported into Wikidata. Scann (talk) 16:22, 26 August 2018 (UTC)
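A hedged sketch of the clean-up pass described above, using Python's standard csv module; the file names and the column name `denominacion` are guesses at the schema, not something confirmed in this thread:

```python
# Hedged sketch: normalise archive names in the converted CSV before import.
import csv

def clean_name(raw):
    name = raw.strip()
    if name in ("", "()"):
        return ""                  # drop empty "()" placeholders
    if name.isupper():
        name = name.title()        # ARCHIVO GENERAL -> Archivo General
    return name

with open("censo_guia.csv", newline="", encoding="utf-8") as src, \
     open("censo_guia_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "denominacion" is a hypothetical field name for the archive's name
        row["denominacion"] = clean_name(row.get("denominacion", ""))
        writer.writerow(row)
```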
Adresse et géolocalisation des établissements d'enseignement du premier et second degrés (address and geolocation of primary- and secondary-level educational institutions)

- Name of dataset: Adresse et géolocalisation des établissements d'enseignement du premier et second degrés
- Source: Éducation Nationale de la République française (French Ministry of National Education)
- Link: https://www.data.gouv.fr/fr/datasets/adresse-et-geolocalisation-des-etablissements-denseignement-du-premier-et-second-degres/
- Description: Geolocated list of primary- and secondary-level educational institutions and of the administrative structures of the Ministry of National Education. Public and private sectors.
- Request by: Psychoslave (talk) 12:04, 10 September 2018 (UTC)
Workflow
| Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Importing data into Wikidata | Date import complete and notes |
|---|---|---|---|---|---|
|  | Link: Done: To do: Notes: | Structure: Example item: Done: To do: | Done: To do: Notes: | Done: To do: Manual work needed: | Date complete: Notes: |
Discussion:
The Cat and Fiddle Clock, Hobart, Tasmania, Australia

Modern electronics and an old English melody brought this nursery rhyme to life. This focal piece of the Cat and Fiddle Arcade was constructed by Gregory Weeding, a talented, ambitious young local who had studied electronics in Melbourne. Charles Davis, owner of a department store of the same name, had decided to have an arcade and fountain, and felt it needed a clock.
The melody, played by a glockenspiel and vibraphone, was recorded in Melbourne; the musicians had to keep playing it again and again until they took exactly thirty seconds – the time taken by the animated rhyme, with its cat, fiddle, dog, dish, spoon and cow, to run its cycle. The clock strikes the hour and – hey, diddle, diddle – the children stand entranced as the cow jumps over the moon. It happens every day at the Cat & Fiddle Square on the hour from 8 am to 11 pm, 7 days a week. In sequence, the cat plays his fiddle, the cow jumps over the moon, the little dog laughs, and there is a cheeky cameo by the dish and spoon. It has brought pleasure to onlookers since 1962. https://m.youtube.com/watch?v=maeZndy7g8c
SSLF City & Housing

A real estate company in Ekkatuthangal, Chennai. The company is registered with TNRERA and was awarded star-category ISO 9001:2015 certification. It is the only real estate company in Chennai to have been granted an authorised trademark by the central government of India. The company was started on 3 October 2007 by Dr. G. Sakthivel, and is known for having the largest site in Tamil Nadu.